The dataset we are using comes from the UCI Machine Learning Repository. It is called “Online Retail” and can be found at http://archive.ics.uci.edu/ml/datasets/online+retail.
import function as f
#read dataset:
retail_db = f.read_dataset()
retail_db.head()
After pre-processing, the dataset includes 379,979 records and 17 fields:
invoice_id, stock_code,
description, quantity,
invoice_date, unit_price,
customer_id, country,
order_status, sales(£),
sales(£)_abs, quantity_abs,
time, hour, date, month, year
#data pre-processing:
RETAIL = f.data_pre_processing(retail_db)
RETAIL.head()
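The body of `f.data_pre_processing` is not shown here. As a rough sketch of how the derived fields in the list above could be obtained from the raw columns, the snippet below processes a single row in plain Python. The helper name `derive_fields`, the sample values, the date format, the `"completed"` label, and the rule that `sales(£)` equals `quantity * unit_price` are all assumptions made for illustration; the actual function may work differently.

```python
from datetime import datetime


def derive_fields(row):
    """Return a copy of `row` with the derived columns added (hypothetical helper)."""
    out = dict(row)
    # Assumption: the raw timestamp looks like "2010-12-01 09:41:00".
    ts = datetime.strptime(row["invoice_date"], "%Y-%m-%d %H:%M:%S")
    # Invoice ids that start with "C" are treated as canceled orders (returns).
    is_canceled = str(row["invoice_id"]).startswith("C")
    out["order_status"] = "canceled" if is_canceled else "completed"
    # Assumption: line revenue is quantity times unit price.
    out["sales(£)"] = round(row["quantity"] * row["unit_price"], 2)
    out["sales(£)_abs"] = abs(out["sales(£)"])
    out["quantity_abs"] = abs(row["quantity"])
    out["date"] = ts.date().isoformat()
    out["time"] = ts.time().isoformat()
    out["hour"] = ts.hour
    out["month"] = ts.month
    out["year"] = ts.year
    return out


# Made-up sample row, not taken from the real dataset.
sample = {
    "invoice_id": "C000001",
    "stock_code": "X001",
    "description": "sample item",
    "quantity": -1,
    "invoice_date": "2010-12-01 09:41:00",
    "unit_price": 27.5,
    "customer_id": 12345,
    "country": "United Kingdom",
}
derived = derive_fields(sample)
print(sorted(derived))
```

With a DataFrame, the same derivations would normally be done column-wise rather than row by row; the row-based version is used here only to keep the sketch free of dependencies.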
With the pre-processed data, let's take a closer look at the store's behavior during 2011.
#finance and sales report:
FINANCE_REPORT, SALES_REPORT = f.finance_and_sales_report(RETAIL)
#customer and product behaviors:
COUNTRY_TOP10, TIME, RETURNED, REPURCHASE_RATE, PROD_TOP10, SALES_TOP10, ITEM_INVOICE = f.customer_and_product_behavior(RETAIL)
#Charts:
SALES_CHT_LINE, TIME_HIST, SALES_TOP10_CHT_BAR, PROD_TOP10_CHT_BAR, ITEM_INV_HIST = f.charts(SALES_REPORT, TIME, SALES_TOP10, PROD_TOP10, ITEM_INVOICE)
This is a transnational data set that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer.
The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.
In 2011, the store had £8.9M in revenue and sold almost 5M products, with an average ticket of £23.51.
The financial report is shown below.
FINANCE_REPORT
To study the company's financial behavior, revenue and product volumes were broken down month by month over the year.
From September to November, revenue and volumes were almost 50% higher than in the first semester. This may be explained by end-of-year celebrations and promotional dates such as Black Friday in November. The drop in sales in December may point the same way, although the data only runs through 09/12/2011, so December is incomplete.
SALES_CHT_LINE
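To make the month-by-month comparison concrete, here is a small sketch that sums revenue per month and compares the average September to November month with the average first-semester month. The rows are invented toy values, not the real dataset, and comparing monthly averages is only one plausible reading of the comparison described above. The function names are hypothetical.

```python
from collections import defaultdict


def monthly_totals(rows):
    """Sum sales per month from (month, sales) pairs."""
    totals = defaultdict(float)
    for month, sales in rows:
        totals[month] += sales
    return dict(totals)


def pct_change(new, old):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100


# Toy (month, sales) rows; the values are invented for illustration only.
rows = [
    (1, 100.0), (1, 100.0), (2, 200.0), (3, 200.0), (4, 200.0),
    (5, 200.0), (6, 200.0), (9, 300.0), (10, 300.0), (11, 150.0), (11, 150.0),
]

totals = monthly_totals(rows)
first_semester_avg = sum(totals[m] for m in range(1, 7)) / 6
sep_to_nov_avg = sum(totals[m] for m in (9, 10, 11)) / 3
increase = pct_change(sep_to_nov_avg, first_semester_avg)
print(f"Sep-Nov vs first semester: {increase:+.1f}%")
```

Averaging per month keeps the comparison fair, since the two periods have different lengths (three months versus six).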
As an online retailer, the store has registered sales in 36 countries. Although it sells worldwide, its main market is the United Kingdom, which accounted for 81% (approx. £6M) of sales in 2011.
COUNTRY_TOP10
There are many ways to measure customer loyalty, one of which is the repurchase rate.
The store's repurchase rate is 98.23%, which means that nearly all customers come back to buy again after their first purchase.
Only 2.16% of invoices have a canceled status (product returns), which suggests that satisfaction with the products is also high. The most likely reasons for these returns are that the customer gave up on the order or that the product did not meet expectations.
print('Repurchase rate {}%.'.format(REPURCHASE_RATE).upper())
RETURNED
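How `REPURCHASE_RATE` and `RETURNED` are computed inside `f.customer_and_product_behavior` is not shown. The sketch below uses one common definition, as an assumption: the repurchase rate is the share of customers with more than one distinct non-canceled invoice, and the canceled rate is the share of distinct invoices whose id starts with "C". The function names and the toy `(invoice_id, customer_id)` pairs are hypothetical.

```python
def repurchase_rate(invoices):
    """Percent of customers with more than one distinct non-canceled invoice."""
    per_customer = {}
    for invoice_id, customer_id in invoices:
        if invoice_id.startswith("C"):
            continue  # canceled invoices do not count as purchases
        per_customer.setdefault(customer_id, set()).add(invoice_id)
    repeat = sum(1 for ids in per_customer.values() if len(ids) > 1)
    return repeat * 100 / len(per_customer)


def canceled_rate(invoices):
    """Percent of distinct invoices whose id starts with 'C'."""
    ids = {invoice_id for invoice_id, _ in invoices}
    canceled = sum(1 for invoice_id in ids if invoice_id.startswith("C"))
    return canceled * 100 / len(ids)


# Toy (invoice_id, customer_id) pairs; one pair is repeated on purpose,
# because an invoice usually spans several product lines.
invoices = [
    ("1001", "A"), ("1002", "A"), ("1001", "A"),
    ("1003", "B"), ("C1004", "B"),
    ("1005", "C"), ("1006", "C"),
    ("1007", "D"), ("1008", "D"),
]

print(f"Repurchase rate: {repurchase_rate(invoices):.2f}%")
print(f"Canceled invoices: {canceled_rate(invoices):.2f}%")
```

Counting distinct invoices rather than rows matters here, because a single order with many product lines would otherwise look like many purchases.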
Now that we know most customers are loyal to the store, we can analyze the behavior of the top 10 products, the time of day when customers usually buy, and how many items are ordered per invoice. This kind of information could give some insight into how to raise sales.
Order volume depends on the hour of the day. The histogram shows that most orders are placed between 10:00 and 15:00. If the company plans a promotional voucher, it should probably target the quieter periods instead, such as 6 AM to 9 AM or 5 PM to 7 PM, to increase the number of orders during those hours.
TIME_HIST
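As an illustration of what the histogram summarizes, the following sketch counts distinct invoices per hour and measures the share that falls in the 10:00 to 15:00 window. The data is a made-up toy sample, and `orders_per_hour` is a hypothetical helper, not part of the `function` module.

```python
from collections import Counter


def orders_per_hour(invoices):
    """Count distinct invoices per hour from (invoice_id, hour) pairs."""
    distinct = set(invoices)  # collapse repeated lines of the same invoice
    return Counter(hour for _, hour in distinct)


# Toy (invoice_id, hour) pairs; invoice "2" appears twice on purpose.
invoices = [
    ("1", 8), ("2", 10), ("2", 10), ("3", 11), ("4", 12),
    ("5", 12), ("6", 13), ("7", 14), ("8", 17),
]

counts = orders_per_hour(invoices)
# Hours 10 to 14 cover orders placed from 10:00 up to, but not including, 15:00.
peak_share = sum(counts[h] for h in range(10, 15)) * 100 / sum(counts.values())

# Minimal text histogram, one "#" per order.
for hour in sorted(counts):
    print(f"{hour:02d}:00 {'#' * counts[hour]}")
print(f"Share of orders between 10:00 and 15:00: {peak_share:.1f}%")
```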
Invoices marked with a C are canceled orders, so we treat them as returns. The histogram suggests that most invoices contain fewer than 10 items.
ITEM_INV_HIST
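The text does not say exactly how "items per invoice" is counted. As a sketch under the assumption that it means the number of distinct stock codes on each non-canceled invoice, the snippet below computes that count and the share of invoices with fewer than 10 items. The helper name and the toy lines are hypothetical.

```python
from collections import defaultdict


def items_per_invoice(lines):
    """Map invoice_id -> number of distinct stock codes, skipping canceled invoices."""
    items = defaultdict(set)
    for invoice_id, stock_code in lines:
        if invoice_id.startswith("C"):
            continue  # canceled invoices are treated as returns, not purchases
        items[invoice_id].add(stock_code)
    return {invoice_id: len(codes) for invoice_id, codes in items.items()}


# Toy (invoice_id, stock_code) lines, invented for illustration.
lines = [("1001", f"P{i}") for i in range(3)]
lines += [("1002", f"P{i}") for i in range(12)]
lines += [("1003", "P1"), ("1003", "P1"), ("C1004", "P2"), ("1005", "P9")]

counts = items_per_invoice(lines)
small_share = sum(1 for n in counts.values() if n < 10) * 100 / len(counts)
print(f"Invoices with fewer than 10 items: {small_share:.1f}%")
```

If "items" were meant as total units instead, the set of stock codes would be replaced by a running sum of `quantity`.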
The charts below show the top 10 products by revenue and the top 10 products by quantity sold.
SALES_TOP10_CHT_BAR
PROD_TOP10_CHT_BAR
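The two rankings can differ, because a cheap product may sell in large quantities while an expensive one brings in more revenue. The sketch below shows this with a hypothetical `top_products` helper and invented product lines; it is not the implementation behind `SALES_TOP10` or `PROD_TOP10`.

```python
from collections import Counter


def top_products(lines, key, n=10):
    """Rank products by the summed value of `key` ('quantity' or 'sales')."""
    totals = Counter()
    for line in lines:
        totals[line["description"]] += line[key]
    return totals.most_common(n)


# Toy product lines; names and numbers are invented for illustration.
lines = [
    {"description": "mug", "quantity": 50, "sales": 100.0},
    {"description": "lamp", "quantity": 5, "sales": 250.0},
    {"description": "mug", "quantity": 30, "sales": 60.0},
    {"description": "card", "quantity": 100, "sales": 40.0},
]

by_revenue = top_products(lines, "sales", n=2)
by_volume = top_products(lines, "quantity", n=2)
print("Top by revenue:", by_revenue)
print("Top by volume:", by_volume)
```

In this toy sample the lamp leads on revenue while the card leads on volume, which is why both charts are worth reading side by side.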